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1. INTRODUCTION 

Classification is a process of assigning an instance in the data to the class that it belongs to. 
Classification is included in many applications of pattern recognition, machine learning and data mining. 
The classification task becomes more challenging when dealing with large databases especially when the data 
keeps changing from time to time. Data stream is an example in which new information and classes are 
continuously added and updated. This causes the expansion of the size of the data and may reduce the 
accuracy of classifiers [1]. Robotic technology is another example of learning large data in which the data is 
continuously updating. Dealing with large and dynamic data may reduce the learning process of the robot to 
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perform certain tasks [2]. The challenge of learning big data is also addressed in the work of [3]. For 
example, in the Apache Hadoop system, processing and storage big data is challenging because of large 
number of configurations. 

A network based classifier like artificial neural network (ANN) is not suitable for large database 
because it is slow in training phase. This is due to the complexity of its structure that contains the number of 
nodes, hidden layers and activation function [4]. ANN is also not suitable to deal with dynamic database as it 
requires to be retrained whenever a change occurs or new data is being added [5]. Instead of using the 
traditional nueral network, some researchers used extreme learning machine (ELM) which 1s a type of neural 
network with only one hidden layer. The work of [6] showed that ELM performed good in term of accuracy 
and speed in the field of pattern recognition. The work of [7] compared ELM with support vector machine 
(SVM) and the traditional neural network and showed that ELM is faster and more accurate than both 
algorithms. However ELM still requires to be retrained whenever a change occurs in the data which makes it 
not suitable to be used with large and dynamic data. 

Distance based classifiers perform the classification by assigning the testing instance to the nearest 
training instance or prototype in the space. A simple classifier like first nearest neighbor (IstNN) represents a 
class density in the space using all training instances. IstNN is very accurate and fast in training phase even 
when dealing with large number of instances, but it suffers from curse of dimensionality and very slow in 
testing phase, which makes it impractical [8-11]. Many researchers attempted improving the speed and 
accuracy of IstNN [10-14]. Nearest subclass classifier (NSC) represents a class density in the space as 
multiple prototypes, where K-means clustering algorithm is typically used to find the prototypes [9]. NSC is 
faster than IstNN in the testing phase and it is very accurate if it gets correct choice of the prototypes 
number. However, NSC with k-means is not suitable for large data. The work of [15] showed that k-means is 
not suitable to be used with large data because it is an iterative algorithm and can be slow with big data. 
Nearest class mean (NCM) represents a class density in the space as one prototype, which is the center of the 
class. NCM is a special case of NSC that use one prototype. NCM tends to generalize well for the base 
configuration [16]. It is very fast in training and testing phases but not accurate especially when the class 
density in the space spreads wider [9], [11], [17], Support vector machine (SVM) represents a class density in 
the space as support vectors. The number of support vectors increases when dealing with large data using two 
approaches: one versus one and one versus rest. The work of [3] showed that SVM can provide good 
accuracy. However, SVM is slow in training phase especially when dealing with large data [17-20]. It also 
requires to be retrained whenever a change in the data occurs. 

Bayesian classifiers perform the classification by assigning the instance to the class with the highest 
posterior probability using Bayesian rule. Naive Bayes (NB) represents a class feature probability density 
using only a single Gaussian function. NB is a simple, fast and accurate classifier when the class distribution 
is Gaussian. However, its accuracy decreases if the class distribution is not Gaussian [8]. Due to its 
simplicity, speed and accuracy in large databases, NB has become a popular classifier in many applications 
[21-24]. The accuracy of the NB classifier is comparable to the ANN [25]. However, it is lesser than that of 
the SVM [26-28]. Gaussian Mixture Model Classifier (GMMC) represents a class feature probability density 
using Gaussian mixture model function. An iterative algorithm lke Expectation Maximization (EM) is 
normally used to estimate Gaussian mixture model (GMM) parameters [29-33]. GMMC is more accurate 
compared to NB but slower in training phase due to EM. Flexible Naive Bayes (FNB) represents a class 
feature probability density using kernel function [22], [34], [35]. Gaussian kernel function is normally used in 
FNB [36]. FNB with Gaussian kernel function is similar to GMMC in which both of them represent a class 
feature probability density as mixture of Gaussians. 

The main problem of classification is that it is difficult and challenging to have a classification 
algorithm that is fast and accurate when dealing with large number of instances. Reviewing the past researchs 
showed that the classification algorithms like ANN, SVM, IstNN, NSC and GMMC are accurate but slow 
when dealing with large number of instances. In additional to that these algorithms might require to be 
retrained when the classes changes. On the other hand, algorithms like NB and NCM are fast but not 
accurate. 

In our previous work, PHBayes which is a Bayesian algorithm represents a class feature probability 
density using histogram [37]. Compared with other algorithms, PHBayes is fast in both training and testing 
phases. It is also very accurate when dealing with large number of instances. In this paper, a new Flexible 
Partial Histogram Bayes learning algorithm (fPHBayes) is proposed which is an improvement of PHBayes. 
Compared with PHBayes in our previous work [37], the proposed fPHBayes is more accurate to deal with 
small and large number of instances, more flexible to the class probability distribution and requires fewer 
parameters to be considered. fPHBayes uses a probability distribution derived from smoothing the observed 
histogram and performs the classification using Bayesian rule. 
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Our contributions in this work is proposing a new supervised learning algorithm fPHBayes which is 
an accurate algorithm, able to work with small and large number of instances and flexible to the distribution 
of the class. fPHBayes is also compared with other standard and most related and recent learning algorithms 
considering real data from UCI database and synthetic data analysis. 

The rest of this paper is organized as follows: PHBayes and the proposed algorithm fPHBayes are 
explained in more details in the methodology section. Results and analysis are provided in the results and 
discussion section and the conclusion is provided in the conclusion section. 


2. RESEARCH METHOD 
2.1. Partial Histogram Bayes Learning (PHBAYES) 

PHBayes is a supervised probabilistic learning algorithm. It performs the classification by assigning 
a training instance, X, where X = (Xj,...,X,) represents a vector of a traming instance with n features, to the 
class, C, with the highest posterior probability, P(CIX), applying the Bayesian rule as in Equations (1) 
and (2). The traming phase of PHBayes consists of three steps: the generation of the observed histogram, 
estimation of noise level in the histogram and estimation of histogram probability. 


P(Cl 9-7 (1) 
P(CIX)=P(C)] [P(X,10) (2) 


The observed histogram, b = (bj,...,b,), 18 generated directly from the instances of the class. The 
value of each bin in the histogram, b;, represents the frequency of occurrence. The histogram resolution, r, is 
an important parameter in order to generate more accurate histogram compared to its own probability 
function. 

Figure | shows observed histograms from small and large number of instances generated from 
mixture of Gaussian with two Gaussian components. In general, increasing the number of instances improves 
the accuracy of the observed histogram. The estimation of the noise level in the histogram is important 
procedure toward reducing the error in the histogram especially when dealing with small number of 
instances. The probability of chance, p,,(b;,), of each bin in the histogram is calculated using De-moivre 
Laplace theorem by treating the number of instances, s, as trials [38]. According to De-moivre Laplace 
theorem, when the number of instances is large, the probability of chance can be represented as Gaussian 
distribution with mean, u., = sp and variance, >-n = spg, where p = 1/r and g = |-p. The probability of chance 
is given in Equation (3). Figure 2 shows De-moivre Laplace theorem applied on an observed histogram taken 
from six sided dice. Figure 3 shows how to estimate the level of noise in PHBayes for three steps of iteration. 
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Figure 1. Observed histograms (a) is probability density function with two Gaussians. (b) and (c) are 
observed histograms from small number of instances using 32 and 64 bins. (d) and (e) are observed 
histograms from large number of instances using 32 and 64 bins. 
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Figure 2. Calculating the probability of chance applied on the observed histogram of six sided dice. (a) The 
probability of chance. (b) The observed histogram of six sided dice. 
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Figure 3. Histogram level of noise estimation in PHBayes (hard approach). (a), (b) and (c) are three steps of 
iteration where the lower horizontal red line represents the mean of the chance or noise,y,,and the upper 
horizontal black line represents the thresholdfy. 


After determining the initial u., and > .,, an iteration procedure is performed to estimate the 
histogram level of noise. In each step of iteration, the instances of each bin, b; with low probability of chance 
are removed or subtracted from the total number of instances, s, resulting in the remaining instances or the 
instances that occur by chance, s,,, as given in Equation (4), where the upper frequency, fy, is a threshold 
controlled by £ parameter as in Equation (5). As a result of that, u., and >, are also updated in each step of 
iteration from the value of s,,. This iterative procedure stops when it converges. After it converges, the 
histogram level of noise and its variance are the latest update of u,, and >\-y. In Figure 3, the upper horizontal 
dark line represents fy and the lower horizontal red line represents u,, while f, at zero. This approach is the 
hard thresholding approach. In each step of iteration, the parts of the histogram that are higher than the 
threshold fy are removed. 
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The histogram probability is estimated by smoothing the histogram bins b,; that have high 
probability of chance p,,(b;,)around the level of noise u,;then rescaling it by the number of instances s. There 
are two conditions in the histogram smoothing. First, if the probability of chance of a bin is lower than the 
threshold p.z(b,)</, then no smoothing process occur. In this case, the bin of the histogram is considered to 
be reliable. Second, if the probability of chance of the bin is higher than the threshold, p-,;,(b,)=f, then the bin 
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of the histogram is not considered to be reliable and it is rescaled around the level of noise. The parameter a 
is used to control the scale of smoothing. The histogram probability, p(b), is calculated by rescaling the 
smoothed histogram by the number of instances. The smoothing and probability estimation is calculated 
using Equation (6). Figure 4 shows how to smooth the histogram and calculate its probability in PHBayes. 
Here, only the bins between fyand f,are smoothed around [U¢p. 


= b, /s Pi (i) <B 6 
rtO)=\(a4n, — Ly, )+ La, )f8 » Pen (b,)2= 8 


The testing phase of PHBayes is done by assigning the instance X to the class C with the highest 
posterior probability P(CIX) using the Bayesian rule as in Equation (7), where k is the nearest bin to 
value of X;. 
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Figure 4. Histogram smoothing (a) and probability estimation (b) in PHBayes. The lower horizontal red line 
represents the mean of the chance or noise w,,and the upper horizontal black line represents the threshold/y. 


The advantages of PHBayes are that it is fast both in traming and testing phases. It is accurate 
especially when the number of instances increase as the error in the probability estimation reduces. It has low 
memory requirement since it only save bins of the histogram and probability for each class. It is also flexible 
to the shape and distribution of class by using histogram probability and is not required to be retrained 
although there are changes in the data and classes. 

The limitations of PHBayes is that its performance is highly dependent on the values of the 
parameters a, / and r. Using different values of these parameters may lead to different classification results. 
For example, using small r is good to estimate the histogram probability with small number of instances 
while using large r is better with large number of instances. PHBayes is also not very accurate when dealing 
with small number of instances or if the distribution of the class spread wider. This is because in these cases, 
estimation of the histogram probability becomes difficult. These limitations reduce the estimation accuracy of 
the histogram probability. 


2.2. Flexible Partial Histogram Bayes Learning (fPHBayes ) 

In this paper, a supervised flexible partial histogram Bayes learning algorithm (fPHBayes) 1s 
proposed to overcome some of the limitations of the original PHBayes. Compared with PHBayes, fPHBayes 
requires fewer parameters to consider and its performance does not depend on the parameters a and f. It is 
also suitable to work with large and small number of instances using higher histogram resolution. This 
property helps to increase the accuracy of fPHBayes. 

Similar to PHBayes, the training phase of fPHBayes also consists of the three steps: the generation 
of the observed histogram, estimation of the level of noise in the histogram and estimation of the histogram 
probability. The procedure of generating the observed histogram in fPHBayes is similar to that in PHBayes 
with the exception that fPHBayes uses higher histogram resolution. 
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The procedure of estimating the level of noise in the histogram in fPHBayes is improved compare 
with PHBayes, where a soft approach is used instead of the hard one. The mean, y,, and variance, >. are 
calculated from the probability of chance in each step of iteration and the remaining instances, Ss,,is 
calculated using Equation (8). Compared with the calculation of the s,, in the PHBayes that uses hard 
threshold fy which is controlled by the parameter /, the calculation of the s,,in the fPHBayes as in 
Equation (8) uses soft approach which does not have fyor f, and not controlled by the parameter /. In another 
word, to find the remaining instances s,,, the reliable and non-noisy parts of all b,are subtracted from s. 
The non-noisy part for each b,;, is the multiplication of the inverse probability of chance for the bin (1 - 
Pcn(b;)) by the difference between b,and uy. This is only applicable when b,;>u,,. As a result of that, in each 
step of iteration, u.,and > .,are updated based on the value of s.,. This procedure stops when it converges. 
The histogram level of noise and its variance are the latest update of u,,and > ¢p. 


Sch = 5 — >[(i- Pen (b, )) x(b, ~ Hen ) ID. > Ly | (8) 


Figure 5 shows how to estimate the level of noise in fPHBayes for three steps of iteration. The soft 
approach in fPHBayes affects all the parts in the histogram that are higher than the level of chance or noise 
Ucn. The soft approach does not have fixed threshold like fyas in the hard approach. Instead of that, each bin 
in the histogram that has a value higher than the level of chance is effected differently based on the 
probability of chance of the bin p,,(b;). For example in the first step of iteration, the bin number 7, b7 with the 
lowest probability of chance is the most effected bin by the soft approach. On the other hand, the bin b2, with 
high probability of chance is mostly not affected by the soft approach in the first step of iteration. The second 
and the third steps of iteration also show that when the level of chance u,,reduces after removing the non- 
noisy parts of the histogram, more parts of the histogram are affected because their probabilities of chance 
reduce. This is because the soft approach affects each bin based on its probability of chance. As a result of 
that the parameter / is not required. 
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Figure 5. Histogram level of noise estimation in fPHBayes (soft approach). (a), (b) and (c) are three steps of 
iteration where the horizontal red line represents the mean of the chance Up. 


The histogram probability estimation in fPHBayes is also improved compared with PHBayes. In 
fPHBayse, each bin in the observed histogram, b;is smoothed by redistributing the part that is higher than the 
level of chance or noise (b;-u-,) among the neighboring bins b;, where 1 <j <r. The redistribution of the bin 
b,among the neighboring bin b;, G(b;, b;) is following Gaussian distribution NG, ug, >.G(b;)) scaled by (bx - 
Ucn)as In the Equation (9). The mean of the smoothing or redistribution, ug, is the histogram bin number, k, 
and the variance of the redistribution, )}c, depends on the probability of chance of bin p,,(b;) and can be 
calculated using Equation (10). Therefore, the smoothed value of the bin, &b,;) is the summation of the 
redistribution of all neighboring bins G(b;, b;,)plus the level of noise u,;,as in the Equation (11). Finally and in 
order to get 100% probability distribution, the histogram probability, p(b), is estimated by rescaling the 
smoothed histogram, &(b), by the number of instances s as in Equation (12). This means the higher the 
probability of chance the bin has, the larger the variance of smoothing it has. Here, the parameter a is not 
required as each bin has a different variance of smoothing based on its p,,(b;,). This makes sure that only the 
noisy part of the histogram is smoothed and based on the amount of noise it has. This allows fPHBayes to 
work with small and large number of instances using larger histogram resolution compared with PHBayes. 
Figure 6 shows how to smooth the histogram and calculate its probability in fPHBayes. Here as an example 
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in Figure 6, the highlighted bins b7 and b2, have different variance of smoothing represented by black lines. 
Since bzhas lower probability of chance than b,, its instances are redistributed in smaller area compared with 
b>,. Finally, the probability is estimated by redistributing all bins in the histogram. 


G(b,.b,)=N (je 2o (B))*( — Hs) (9) 

oe ee (10) 
s(1=p,, (b, )) 

A(b,) = ty, + G(bj.b) (11) 

P(b.) =9(b )/s (12) 


Similar to PHBayes, the testing phase in fPHBayes is done by assigning the instance X to the class C 
with highest posterior probability P(CIX) using the Bayesian rule as in Equation (7). Figure 7 shows the 
histogram probability estimation of both PHBayes and fPHBayes using small and large number of instances. 
It is clear that both PHBayes and fPHBayes estimate the histogram probability better when the number of 
instances is large with respect to the histogram resolution. However when the number of instances is small, 
the estimation of fPHBayes is better than PHBayes. This is because the smoothing part in fPHBayes is 
improved. It tends to redistribute only bins with high probability of chance among other bins. This 
improvement makes sure that smoothing process only applies on the part that is not reliable in the histogram. 
This allows fPHBayes to work with large and small number of instances and with large and small variances 
using large histogram resolution. 
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Figure 6. Histogram smoothing and probability estimation in fPHBayes. (a), (b) and (c) are the steps to 
smooth the histogram bins b7andb>,. (d) The probability estimation by smoothing all bins 
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Figure 7. Probability estimation of PHBayes and fPHBayes. (a), (b) and (c) represent the probability 
estimation of PHBayes(32), fPHBayes(32) and fPHBayes(64) with small number of instances. (d), (e) and (f) 
represent the probability estimation of PHBayes(32), fPHBayes(32), fPHBayes(64) with large number of 
instances 


3. RESULTS AND DISCUSSION 
3.1. Results of Synthetic Data 

In synthetic data experiments, two factors that affect the accuracy of the classifiers were considered. 
These factors were the number of training instances and variance of Gaussians. These factors were selected 
for these experiments to show the improvement in the proposed fPHBayes compared with PHBayes. 
Different classifiers were considered for the experiments. They were 1°'NN, NCM, NSC with 5 components, 
GMMC with 5 components, PHbayes with r=16, PHBayes(16), PHBayes with r=32, PHBayes(32), PHBayes 
with 7=48, PHBayes(48) and the proposed algorithm with r=64, fPHBayes(64). The values for a and / of the 
PHBayes were set to a=0.5 and /=0.005 [37]. The reason of using 64 bins of histogram in fPHBayes is to 
allow it to cover all ranges of histogram lower than 64 bins. This means fPHBayes with 64 bins of histogram 
is flexible to use any histogram lower than 64 bins. These different classifiers were selected for the test 
because they are standards and recent algorithms and used in many reseearchs. For example NSC was used in 
the works of [11, 17], GMMC was used in the work of [32, 33], and our previous PHBayes was 
proposed [37]. It is also important to mention that NCM is a special case of NSC in which a single prototype 
is used to represent aclass density and NB is a special case of GMMC in which a single Gaussian component 
is used. The generation of the synthetic data was based on the representation of probability density functions 
of features of the classes as mixture of Gaussians. For all the classes, instances were derived from the 
probability density function and the number of training instance was the same as number of testing instances. 
The default value of the number of training instances per class was 300 and for the variance of Gaussians was 
around 0.1. Ten classes were used in each experiment. Each class had 16 features (8 independents and 8 
dependents ). Figure 8 shows the results of synthetic data experiments. 

Figure 8a shows an analysis on the impact of the number of training instance as one of the factors 
that affects the classification accuracy. Number of training instance was varied from 60 to 540. The values 
for other factors were fixed at their respective default values. In general, the accuracy of all the classifiers 
increased with the addition of number of instance. However, the performance rate of the classifiers was at the 
different rate. The accuracy of the NB and NCM was less affected with the addition of the number of 
instance. PHBayes produced significant improvement as the number of instance increased. Large number of 
instance helped to improve the estimation of the histogram in the PHBayes. fPHBayes was very accurate 
when the number of instances was small as well as when the number of instances is large compared with 
PHBayes. 

Figure 8b shows the results of variance of Gaussian that ranges from 0.02 to 0.18. Other factors 
were set to their default values. The increment of the variance typically reduced the classification accuracy of 
all the techniques. Compared to the other classifiers, the declining rate of the accuracy of the PHBayes was 
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more apparent because the class probability density was spread wider. fPHBayes was less effected by this 
factor than PHBayes. 
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Figure 8. Accuracies results (a) Results from varying the number of training instances (b) Results from 
varying the variances of Gaussians (c) Legends 


3.2. Resuts of Real Data 

In real data experiments, twenty different databases from UCI machine learning database 
(http ://www.ics.uci.edu/~mlearn/M LRepository.html) containmg more than 50,000 instances from different 
classes were used. The database is presented in Table 1. Each database contains different classes, features, 
and instances. In order to standardize the analysis for all the classifiers, the features were scaled equally and 
nominal feature were ignored. About half of the instances were randomly selected as training instances and 
the other half were used for testing. fPHBayes was compared with other classifiers, 1°-NN, NSC, NCM, NB, 
GMMC and PHBayes. 

Table 2 presents the accuracy results of the classifiers on the databases. In general, fPHBayes 
produced high accurate results in 12 of the 20 databases (60%) as compared to other classifiers. fPHBayes 
was the most accurate classifier with average accuracy 0.7888. Compared with PHBayes, fPHBayes is the 
winner for 16 out of 20 of the databases. 

The average computational time using the 20 databases is shown in Table 3. The time of fPHBayes 
was comparable to the fastest algorithm PHBayes in the testing phase, but slower in the training phase. 


Table 1. Twenty UCI databases used in the analyses 


No. Database name No. of instance No.offeatures No. of classes 
if Abalone 4177 8 3 
2 Balance-scale 625 4 3 
3 Blood Transfusion Service Center 748 4 2 
4 Car evaluation 1728 6 4 
5 Connectionist Bench (Sonar, 208 60 2 
6 Contraceptive Method Choice 1473 9 3 
7 Glass identification 214 9 6 
8 Haberman's survival 306 3 2 
9 Hayes-Roth 132 =) 3 
10 Hepatitis 155 19 2 
11 Ionosphere 351 34 2 
12 Iris 150 4 3 
13 MAGIC Gamma Telescope 19020 10 2 
14 Mammographic Mass 961 ) 2 
15 Nursery 12960 8 5 
16 Pima Indians Diabetes 768 8 2 
17 Spambase 4601 57 2 
18 Statlog (Image Segmentation) 2310 19 7 
19 Wine 178 13 3 
20 Wisc. Breast Cancer (Diagnostic) 699 10 2 
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Table 2. Accuracy results 


Y PHBayes PHBayes PHBayes  fPHBayes 
No. 1” NN NCM NSC NB GMMC (16) (32) (48) (64) 
if 0.4756 0.3987 0.4158 0.4400 0.3574 0.4993 0.4938 0.4961 0.5036 
2 0.7994 0.7340 0.6875 0.8782 0.8750 0.7785 0.7785 0.7785 0.8772 
3 0.7064 0.6631 0.5992 0.7644 0.7468 0.7535 0.7591 0.7535 0.7623 
4 0.9060 0.7187 0.7141 0.6858 0.7541 0.8010 0.8010 0.8010 0.8135 
5 0.9346 0.9923 0.9481 0.9865 0.9865 0.9231 0.9635 0.9394 0.9942 
6 0.4359 0.4359 0.4088 0.4734 0.3971 0.5075 0.5026 0.5001 0.5043 
7 0.6168 0.5673 0.6037 0.5869 0.7449 0.7598 0.6402 0.6047 0.8065 
8 0.7085 0.7438 0.5699 0.7471 0.7392 0.7444 0.7216 0.7157 0.7608 
9 0.5485 0.5273 0.4970 0.6152 0.6182 0.5318 0.5242 0.5242 0.5576 
10 0.6519 0.7013 0.6714 0.6818 0.7143 0.8026 0.7766 0.7390 0.8117 
11 0.8097 0.7743 0.7034 0.7663 0.7926 0.8526 0.8589 0.8606 0.7817 
12 0.8867 0.4813 0.8080 0.5213 0.8520 0.8093 0.7493 0.7960 0.6640 
13 0.9935 0.9053 0.9171 0.9087 0.9687 0.9665 0.9927 0.9853 0.9950 
14 0.7558 0.7913 0.7527 0.8127 0.8285 0.8358 0.8327 0.8335 0.8340 
15 0.8889 0.8200 0.7869 0.7206 0.7744 0.8664 0.8664 0.8664 0.8717 
16 0.6411 0.6602 0.6331 0.6839 0.6859 0.6828 0.6690 0.6667 0.6909 
17 0.9203 0.8234 0.6483 0.7122 0.7956 0.9370 0.9384 0.9333 0.9438 
18 0.4423 0.4472 0.4239 0.3548 0.6072 0.6416 0.7106 0.7375 0.6523 
19 0.7101 0.7730 0.7416 0.8393 0.8146 0.7247 0.7494 0.7596 0.9753 
20 0.9613 0.9676 0.8914 0.9610 0.9642 0.9736 0.9736 0.9736 0.9759 
nore 0.7397 0.6963 0.6711 0.7070 0.7509 0.7696 0.7651 0.7632 0.7888 
Average 
Table 3. Training and testing times on the 20 databases 
Algorithm Training Time (s) Testing Time (s) 
I“NN --- 354.3157 

NCM 1.7284 0.2485 

NSC 22 2335 1.1067 

NB 3.8252 0.5948 

GMMC 202.8376 3.5611 

PHBayes(16) 1.5672 0.1299 

PHBayes(32) 1.8040 0.1277 

PHBayes(48) 1.9970 0.1308 

fPHBayes(64) 4.8354 0.1315 


3.3. Speed and Memory Analyses 

To analysis the speed and memory requirement for fPHBayes and PHBayes, it is important to look 
at their structures. The training phase in both algorithms consists of three steps: the observed histogram 
building, histogram level of noise estimation and histogram probability estimation. Both algorithms build the 
observed histogram the same way, thus the required time of building the observed histogramis snt where T is 
unit of time. Both soft approach in fPHBayes and hard approach in PHBayes require the same number of 
operands to estimate the histogram level of noise. As a result, the required time to estimate the histogram 
level of noise in both algorithms is /rnt where / is the number of iterations. Smoothing process and estimating 
the histogram probability in fPHBayes is slower than PHBayes since it redistributes the noisy instances of the 
bins among neighboring bins. This makes fPHBayes takes longer time equal to rent compared with only rnt 
in PHBayes. The total required times for fPHBayes and PHBayes are (stlrtr’)nt and (s+/r+r)nt 
respectively. In the testing phase, both algorithms assign a testing instance to the class with the highest 
posterior probability applying Bayesian rule. Because of both algorithms use histogram probability, their 
total testing time is only nt. The memory size of both algorithms is 27nM. This is in order to save the 
observed histogram nM plus the histogram probability rnM, where M is unit of memory. 


3.4. Discussion 

Our contributions in this work is proposing a new supervised learning algorithm fPHBayes which is 
an accurate algorithm, able to work with small and large number of instances and flexible to the distribution 
of the class. fPHBayes is also compared with other standard and most related and recent learning algorithms 
considering real data from UCI database and synthetic data analysis. 

fPHBayes demonstrated more accurate results than other tested algorithms in most of the 
experiments. It is able to work with small as well as large number of instances and flexible to the class 
distribution. fPHBayes is a Bayesian algorithm which does not fall into curse of dimensionality. It is also 
important to mention than by using histogram probability, it is easy to handle the changes in the data. This is 
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because it is easy to update the histogram probability compared with retraining like in SVM, ANN, GMMC 
and NSC. 

Speed and memory analysis demonstrated that fPHBayes is fast in training and testing phases and 
requires small memory. Speed analysis showed that the iteration part of the traming phase does not depend 
on the number of instances for histogram algorithms which make them very fast in training phase. It also 
showed that the speed of these algorithms does not depend on the number of instances in the testing phase. 
This makes them have the fastest speed in testing phase compared with other tested algorithms. Results 
showed that histogram algorithms are very fast in training phase and the fastest in testing phase compared 
with NCM, NSC, KNN, NB and GMMC. Memory analysis showed that the memory requirement of 
histogram algorithms does not depend on the number of instances but only on histogram resolution, which is 
very small. 

Limitation of fPHBayes can be inherited from the limitation of Bayesian algorithms. All Bayesian 
algorithms like NB, GMMC, FNB, PHBayes and fPHBayes assume that all features are independents which 
may lead to lost in the accuracy if the features become more dependent. 

In general, fPHBayes showed very good accuracy, remarkable speed and small memory requirement 
with the flexibility to handle changes. With these characteristics, it is expected to use this algorithm in the 
applications that require continuous learning like humanoid robots and expert systems. Future work will 
focus on the accuracy and how to improve it in the case of working with dependent features. 


4. CONCLUSION 

In this paper, fPHBayes is proposed to work fast and accurate with large and small data and to 
cover some limitations in our previous work PHBayes. Compared with other tested algorithms, fPHBayes is 
more accurate and more flexible to handle different number of instances with different variances of 
Gaussians. Compared with PHBayes, the level of noise estimation and the probability estimation in 
fPHBayes is improved. fPHBayes does not require the parameters a and # as it applies soft approach and 
redistributes the instances based on the probability of chance. These features allow fPHBayes to work with 
higher resolution and provide betteraccuracy than PHBayes. The speed and memory analysis showed that the 
speed and memory of fPHBayes do not depend on the number of instances. The experiments on synthetic and 
real data showed that fPHBayes provided the most accurate results compared with PHBayes, IstNN, NSC, 
NCM, NB and GMMC. 
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